Arabic Font Recognition using Decision Trees Built from Common Words

Author

  • I. S. I. Abuhaiba
Abstract

We present an algorithm for a priori Arabic optical font recognition (AFR). The basic idea is to recognize the fonts of some common Arabic words. Once these fonts are known, they can be generalized to lines, paragraphs, or neighboring non-common words, since these components of a text almost always share the same font. We use a decision tree to recognize Arabic fonts. A set of 48 features is used to learn the tree; these features include horizontal projections, Walsh coefficients, invariant moments, and geometrical attributes. A set of 36 fonts is investigated. The overall success rate is 90.8%, and some fonts show a 100% success rate. The average time required to recognize the font of a word is approximately 0.30 seconds.
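As a rough illustration of two ideas named in the abstract, the sketch below computes a horizontal projection for a binary word image and generalizes word-level font predictions to a whole line by majority vote. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the toy image, the font labels "Naskh" and "Kufi", and the majority-vote rule are all hypothetical, and the decision tree and the remaining features are not shown.

```python
from collections import Counter


def horizontal_projection(image):
    """Count ink pixels (value 1) in each row of a binary word image."""
    return [sum(row) for row in image]


def normalized_projection(image):
    """Scale the row counts to sum to 1 so the profile ignores ink volume."""
    profile = horizontal_projection(image)
    total = sum(profile)
    return [count / total for count in profile] if total else profile


def line_font(word_fonts):
    """Give a line the font most often predicted for its common words.

    word_fonts holds one predicted label per word; None marks a
    non-common word that was not classified. Majority voting is an
    assumption here; the abstract does not say how fonts are generalized.
    """
    votes = Counter(label for label in word_fonts if label is not None)
    return votes.most_common(1)[0][0] if votes else None


# Toy 4x5 binary image (hypothetical data, 1 = ink).
word = [
    [0, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
profile = horizontal_projection(word)

# Hypothetical per-word predictions for one line of text.
fonts = line_font(["Naskh", None, "Naskh", "Kufi", None])
```

In a fuller pipeline, the projection values would be only one part of the 48-element feature vector passed to the decision tree, and only the words recognized as common would contribute a vote.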

Similar resources

Arabic Font Recognition Based on Templates

We present an algorithm for a priori Arabic optical Font Recognition (AFR). First, words in the training set of documents for each font are segmented into symbols that are rescaled. Next, templates are constructed, where every new training symbol that is not similar to existing templates is a new template. Templates are sharable between fonts. To classify the font of a word, its symbols are mat...

A New Method for Extracting Named Entities in Classical Arabic

In Natural Language Processing (NLP) studies, developing resources and tools contributes to the extension and effectiveness of research in each language. In recent years, Arabic Named Entity Recognition (ANER) has attracted the attention of NLP researchers because of its significant impact on improving other NLP tasks such as machine translation, information retrieval, question answering, query result...

Modeling phonetic context with non-random forests for speech recognition

Modern speech recognition systems typically cluster triphone phonetic contexts using decision trees. In this paper we describe a way to build multiple complementary decision trees from the same data, for the purpose of system combination. We do this by jointly building the decision trees using an objective function that has an added entropy term to encourage diversity among the decision trees. ...

Open-vocabulary recognition of machine-printed Arabic text using hidden Markov models

In this paper, we present multi-font printed Arabic text recognition using hidden Markov models (HMMs). We propose a novel approach to the sliding window technique for feature extraction. The size and position of the cells of the sliding window adapt to the writing line of Arabic text and ink-pixel distributions. We employ a two-step approach for mixed-font text recognition, in which the input ...

Scholarly Book Review (Literature) / "Through Culture the Soul Is Sound": A Critique of the Book Dictionary of Arabic Words and Expressions of Shahnameh, Hooshang Mohammadi Afshar

The latest comprehensive and detailed research on the recognition, description, and the etymology of the Arabic lexicon of Shahnameh is the dictionary of Arabic words and Expressions of Shahnameh, written by Dr. Sajjad Aydanlou. This book is based on the second edition of the Correction of the Khaleghi Motlagh Shahnameh (1393) which is the most authoritative correction and the closest to the or...

Journal title:
  • CIT

Volume 13  Issue 

Pages  -

Publication date 2005